AITopics | causal language model

Collaborating Authors

causal language model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Simple Language Model for Task-Oriented Dialogue

Neural Information Processing SystemsDec-24-2025, 20:12:40 GMT

Task-oriented dialogue is often decomposed into three tasks: understanding user input, deciding actions, and generating a response. While such decomposition might suggest a dedicated model for each sub-task, we find a simple, unified approach leads to state-of-the-art performance on the MultiWOZ dataset. SimpleTOD is a simple approach to task-oriented dialogue that uses a single, causal language model trained on all sub-tasks recast as a single sequence prediction problem. This allows SimpleTOD to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2. SimpleTOD improves over the prior state-of-the-art in joint goal accuracy for dialogue state tracking, and our analysis reveals robustness to noisy annotations in this setting. SimpleTOD also improves the main metrics used to evaluate action decisions and response generation in an end-to-end setting: inform rate by 8.1 points, success rate by 9.7 points, and combined score by 7.2 points.

language model, name change, simple language model, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Fluent but Unfeeling: The Emotional Blind Spots of Language Models

Shu, Bangzhao, Joshi, Isha, Karnaze, Melissa, Pham, Anh C., Kakkar, Ishita, Kothe, Sindhu, Hovasapian, Arpine, ElSherief, Mai

arXiv.org Artificial IntelligenceSep-12-2025

The versatility of Large Language Models (LLMs) in natural language understanding has made them increasingly popular in mental health research. While many studies explore LLMs' capabilities in emotion recognition, a critical gap remains in evaluating whether LLMs align with human emotions at a fine-grained level. Existing research typically focuses on classifying emotions into predefined, limited categories, overlooking more nuanced expressions. To address this gap, we introduce EXPRESS, a benchmark dataset curated from Reddit communities featuring 251 fine-grained, self-disclosed emotion labels. Our comprehensive evaluation framework examines predicted emotion terms and decomposes them into eight basic emotions using established emotion theories, enabling a fine-grained comparison. Systematic testing of prevalent LLMs under various prompt settings reveals that accurately predicting emotions that align with human self-disclosed emotions remains challenging. Qualitative analysis further shows that while certain LLMs generate emotion terms consistent with established emotion theories and definitions, they sometimes fail to capture contextual cues as effectively as human self-disclosures. These findings highlight the limitations of LLMs in fine-grained emotion alignment and offer insights for future research aimed at enhancing their contextual understanding.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.09593

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Non-Markovian Discrete Diffusion with Causal Language Models

Zhang, Yangtian, He, Sizhuang, Levine, Daniel, Zhao, Lawrence, Zhang, David, Rizvi, Syed A, Zappala, Emanuele, Ying, Rex, van Dijk, David

arXiv.org Artificial IntelligenceFeb-13-2025

Discrete diffusion models have emerged as a flexible and controllable paradigm for structured sequence modeling, yet they still lag behind causal language models in expressiveness. To bridge the gap between two paradigms, we introduce CaDDi, a causal discrete diffusion model that unifies sequential and temporal modeling within a non-Markovian diffusion framework. Unlike conventional diffusion models that operate step by step with no access to prior states, CaDDi integrates the temporal trajectory, enabling more expressive and controllable generation. Our approach also treats causal language models as a special case, allowing seamless adoption of pretrained large language models (LLMs) for discrete diffusion without the need for architectural modifications. Empirically, we demonstrate that CaDDi outperforms state-of-the-art discrete diffusion models on both natural language and biological sequence tasks, narrowing the gap between diffusion-based methods and large-scale autoregressive transformers.

large language model, natural language, non-markovian discrete diffusion, (2 more...)

arXiv.org Artificial Intelligence

2502.09767

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Add feedback

COAST: Intelligent Time-Adaptive Neural Operators

Wu, Zhikai, Zhang, Shiyang, He, Sizhuang, Wang, Sifan, Zhu, Min, Jiao, Anran, Lu, Lu, van Dijk, David

arXiv.org Artificial IntelligenceFeb-12-2025

We introduce Causal Operator with Adaptive Solver Transformer (COAST), a novel neural operator learning method that leverages a causal language model (CLM) framework to dynamically adapt time steps. Our method predicts both the evolution of a system and its optimal time step, intelligently balancing computational efficiency and accuracy. We find that COAST generates variable step sizes that correlate with the underlying system intrinsicities, both within and across dynamical systems. Within a single trajectory, smaller steps are taken in regions of high complexity, while larger steps are employed in simpler regions. Across different systems, more complex dynamics receive more granular time steps. Benchmarked on diverse systems with varied dynamics, COAST consistently outperforms state-of-the-art methods, achieving superior performance in both efficiency and accuracy. This work underscores the potential of CLM-based intelligent adaptive solvers for scalable operator learning of dynamical systems.

coast, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.08574

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.89)

Add feedback

A Simple Language Model for Task-Oriented Dialogue

Neural Information Processing SystemsJan-13-2025, 14:12:16 GMT

causal language model, simple language model, task-oriented dialogue, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

CaLMFlow: Volterra Flow Matching using Causal Language Models

He, Sizhuang, Levine, Daniel, Vrkic, Ivan, Bressana, Marco Francesco, Zhang, David, Rizvi, Syed Asad, Zhang, Yangtian, Zappala, Emanuele, van Dijk, David

arXiv.org Artificial IntelligenceOct-3-2024

We introduce CaLMFlow (Causal Language Models for Flow Matching), a novel framework that casts flow matching as a Volterra integral equation (VIE), leveraging the power of large language models (LLMs) for continuous data generation. CaLMFlow enables the direct application of LLMs to learn complex flows by formulating flow matching as a sequence modeling task, bridging discrete language modeling and continuous generative modeling. Our method implements tokenization across space and time, thereby solving a VIE over these domains. This approach enables efficient handling of high-dimensional data and outperforms ODE solver-dependent methods like conditional flow matching (CFM). We demonstrate CaLMFlow's effectiveness on synthetic and real-world data, including single-cell perturbation response prediction, showcasing its ability to incorporate textual context and generalize to unseen conditions. Our results highlight LLM-driven flow matching as a promising paradigm in generative modeling, offering improved scalability, flexibility, and context-awareness. Recent advances in deep learning have revolutionized generative modeling for complex, highdimensional data. In particular, methods based on ordinary differential equations (ODEs), such as continuous normalizing flows (CNFs) (Chen et al., 2018) and flow matching (Lipman et al., 2022), have emerged as efficient tools for modeling continuous data distributions. However, many ODE systems suffer from stiffness making them numerically unstable and computationally expensive to solve accurately (Kushnir & Rokhlin, 2012; Zappala et al., 2024). Recent work in operator learning (Xiong et al., 2021; Cao, 2021; Zappala et al., 2024) has also connected solving integral equations with transformers, the foundational architecture of large language models (LLMs), inspiring the use of LLMs to model dynamical systems through the lens of IEs.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.05292

Country:

Europe > Netherlands > South Holland > Leiden (0.05)
North America > United States > Idaho (0.04)
Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

BERTs are Generative In-Context Learners

Samuel, David

arXiv.org Artificial IntelligenceJun-7-2024

This paper explores the in-context learning capabilities of masked language models, challenging the common view that this ability does not 'emerge' in them. We present an embarrassingly simple inference technique that enables DeBERTa to operate as a generative model without any additional training. Our findings demonstrate that DeBERTa can match and even surpass GPT-3, its contemporary that famously introduced the paradigm of in-context learning. The comparative analysis reveals that the masked and causal language models behave very differently, as they clearly outperform each other on different categories of tasks. This suggests that there is great potential for a hybrid training approach that takes advantage of the strengths of both training objectives.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.04823

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Norway > Eastern Norway > Oslo (0.04)
Europe > Germany > Berlin (0.04)
(14 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

ALMs: Authorial Language Models for Authorship Attribution

Huang, Weihang, Murakami, Akira, Grieve, Jack

arXiv.org Artificial IntelligenceJan-22-2024

In this paper, we introduce an authorship attribution method called Authorial Language Models (ALMs) that involves identifying the most likely author of a questioned document based on the perplexity of the questioned document calculated for a set of causal language models fine-tuned on the writings of a set of candidate author. We benchmarked ALMs against state-of-art-systems using the CCAT50 dataset and the Blogs50 datasets. We find that ALMs achieves a macro-average accuracy score of 83.6% on Blogs50, outperforming all other methods, and 74.9% on CCAT50, matching the performance of the best method. To assess the performance of ALMs on shorter texts, we also conducted text ablation testing. We found that to reach a macro-average accuracy of 70%, ALMs needs 40 tokens on Blogs50 and 400 tokens on CCAT50, while to reach 60% ALMs requires 20 tokens on Blogs50 and 70 tokens on CCAT50.

authorship attribution, dataset, perplexity, (12 more...)

arXiv.org Artificial Intelligence

2401.12005

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Dominican Republic (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.52)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.48)

Add feedback

Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse

Lv, Ang, Zhang, Kaiyi, Xie, Shufang, Tu, Quan, Chen, Yuhan, Wen, Ji-Rong, Yan, Rui

arXiv.org Artificial IntelligenceNov-16-2023

Recent studies have highlighted a phenomenon in large language models (LLMs) known as "the reversal curse," in which the order of knowledge entities in the training data biases the models' comprehension. For example, if a model is trained on sentences where entity A consistently appears before entity B, it can respond to queries about A by providing B as the answer. However, it may encounter confusion when presented with questions concerning B. We contend that the reversal curse is partially a result of specific model training objectives, particularly evident in the prevalent use of the next-token prediction within most causal language models. For the next-token prediction, models solely focus on a token's preceding context, resulting in a restricted comprehension of the input. In contrast, we illustrate that the GLM, trained using the autoregressive blank infilling objective where tokens to be predicted have access to the entire context, exhibits better resilience against the reversal curse. We propose a novel training method, BIdirectional Casual language modeling Optimization (BICO), designed to mitigate the reversal curse when fine-tuning pretrained causal language models on new data. BICO modifies the causal attention mechanism to function bidirectionally and employs a mask denoising optimization. In the task designed to assess the reversal curse, our approach improves Llama's accuracy from the original 0% to around 70%. We hope that more attention can be focused on exploring and addressing these inherent weaknesses of the current LLMs, in order to achieve a higher level of intelligence.

causal language model, language model, reversal curse, (14 more...)

arXiv.org Artificial Intelligence

2311.07468

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models

Liu, Hao, Geng, Xinyang, Lee, Lisa, Mordatch, Igor, Levine, Sergey, Narang, Sharan, Abbeel, Pieter

arXiv.org Artificial IntelligenceJan-31-2023

Large language models (LLM) trained using the next-token-prediction objective, such as GPT3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide range of tasks. In this work, we propose a simple technique that significantly boosts the performance of LLMs without adding computational cost. Our key observation is that, by performing the next token prediction task with randomly selected past tokens masked out, we can improve the quality of the learned representations for downstream language understanding tasks. We hypothesize that randomly masking past tokens prevents over-attending to recent tokens and encourages attention to tokens in the distant past. We find that our method, Forgetful Causal Masking (FCM), significantly improves both few-shot and finetuning performance of PaLM. We further consider a simple extension, T-FCM, which introduces bidirectional context to causal language model without altering the sequence order, and further improves finetuning performance.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2210.13432

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
South America > Colombia > Meta Department > Villavicencio (0.04)
Europe > Ireland (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback